Time Series Data Modeling: Patterns for Temporal Analytics

Time series data - measurements and events ordered by time - requires specialized modeling techniques. Learn the patterns for storing, querying, and analyzing time series data for analytics and AI applications.

Time series data consists of data points indexed in time order - typically measurements or events captured at regular or irregular intervals. Stock prices, server metrics, sensor readings, daily sales, and user activity logs are all time series data. The temporal ordering is not just metadata but the essential structure around which analysis revolves.

Modeling time series data correctly enables powerful temporal analytics - trend detection, seasonality analysis, anomaly identification, and forecasting. Poor time series modeling leads to inefficient queries, incorrect temporal calculations, and AI systems that misinterpret temporal patterns.

Time Series Characteristics

Regular vs. Irregular Intervals

Regular (fixed interval): Measurements at consistent time intervals

  • Server metrics every 60 seconds
  • Daily closing stock prices
  • Hourly weather readings

Irregular (variable interval): Events occurring at variable times

  • User login events
  • Sales transactions
  • Support ticket creation

Regular series enable simpler aggregation and gap detection. Irregular series require more careful handling of time windows.

Single vs. Multiple Series

Univariate: One measurement over time

  • Total website traffic per hour

Multivariate: Multiple related measurements over time

  • CPU usage, memory usage, and network traffic together

Multiple independent series: Many separate time series in one model

  • Metrics for each of 10,000 servers
  • Sales for each of 500 products

Density and Cardinality

  • High cardinality: Many distinct series (one per user, per sensor)
  • High density: Frequent measurements within each series
  • High volume: High cardinality * high density = massive data volumes

These characteristics drive storage and query optimization decisions.

Core Modeling Patterns

Wide Table Format

Each timestamp is a row, each series is a column:

CREATE TABLE daily_metrics (
  date DATE PRIMARY KEY,
  revenue DECIMAL(12,2),
  orders INT,
  visitors INT,
  signups INT
);

date       | revenue | orders | visitors | signups
2024-02-01 | 12500   | 145    | 8900     | 52
2024-02-02 | 11200   | 132    | 8100     | 48

Pros:

  • Intuitive structure
  • Easy to query multiple metrics together
  • Works well for dashboards

Cons:

  • Schema changes needed for new metrics
  • Doesn't scale to high cardinality
  • Sparse series waste storage

Narrow (Long) Table Format

Each measurement is a row:

CREATE TABLE metrics (
  timestamp TIMESTAMP,
  metric_name VARCHAR(100),
  metric_value DECIMAL(20,6),
  PRIMARY KEY (timestamp, metric_name)
);

timestamp  | metric_name | metric_value
2024-02-01 | revenue     | 12500
2024-02-01 | orders      | 145
2024-02-01 | visitors    | 8900

Pros:

  • Flexible - add new metrics without schema changes
  • Handles sparse data efficiently
  • Scales to high cardinality

Cons:

  • Queries across metrics require pivoting (see the sketch after this list)
  • More rows = more storage overhead per value
  • Cross-metric calculations require self-joins
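
For illustration, a minimal pivot sketch against the metrics table above, using conditional aggregation (PostgreSQL FILTER syntax; CASE WHEN works in other engines):

-- Re-create a wide, one-row-per-timestamp shape from narrow rows
SELECT
  timestamp,
  MAX(metric_value) FILTER (WHERE metric_name = 'revenue')  AS revenue,
  MAX(metric_value) FILTER (WHERE metric_name = 'orders')   AS orders,
  MAX(metric_value) FILTER (WHERE metric_name = 'visitors') AS visitors
FROM metrics
GROUP BY timestamp
ORDER BY timestamp;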

Entity-Metric Format

Combine entity identification with narrow format:

CREATE TABLE entity_metrics (
  timestamp TIMESTAMP,
  entity_type VARCHAR(50),
  entity_id VARCHAR(100),
  metric_name VARCHAR(100),
  metric_value DECIMAL(20,6),
  PRIMARY KEY (timestamp, entity_type, entity_id, metric_name)
);

timestamp        | entity_type | entity_id | metric_name  | metric_value
2024-02-01 10:00 | server      | web-01    | cpu_usage    | 45.2
2024-02-01 10:00 | server      | web-01    | memory_usage | 78.5
2024-02-01 10:00 | server      | web-02    | cpu_usage    | 32.1

This handles both multiple entities and multiple metrics per entity.
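
For example, a typical query filters on entity and metric before aggregating over time (a sketch against the table above):

-- Hourly average CPU for one server on one day
SELECT
  DATE_TRUNC('hour', timestamp) AS hour,
  AVG(metric_value) AS avg_cpu_usage
FROM entity_metrics
WHERE entity_type = 'server'
  AND entity_id = 'web-01'
  AND metric_name = 'cpu_usage'
  AND timestamp >= '2024-02-01'
  AND timestamp < '2024-02-02'
GROUP BY DATE_TRUNC('hour', timestamp)
ORDER BY hour;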

Partitioned Time Series

Partition by time for large-scale data:

CREATE TABLE sensor_readings (
  reading_time TIMESTAMP,
  sensor_id VARCHAR(50),
  temperature DECIMAL(5,2),
  humidity DECIMAL(5,2),
  pressure DECIMAL(7,2)
)
PARTITION BY RANGE (reading_time);

-- Monthly partitions
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Benefits:

  • Query pruning - only scan relevant time ranges
  • Efficient data lifecycle - drop old partitions (see the sketch after this list)
  • Parallel loading into different partitions
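
For instance, in a PostgreSQL-style sketch, retiring old data is a metadata-only partition drop, and a time-filtered query scans only the matching partitions:

-- Retire January data by dropping its partition (no bulk DELETE needed)
DROP TABLE sensor_readings_2024_01;

-- Scans only the February partition thanks to partition pruning
SELECT sensor_id, AVG(temperature) AS avg_temperature
FROM sensor_readings
WHERE reading_time >= '2024-02-01' AND reading_time < '2024-03-01'
GROUP BY sensor_id;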

Temporal Query Patterns

Point-in-Time Queries

Get values at specific timestamps:

SELECT metric_name, metric_value
FROM metrics
WHERE timestamp = '2024-02-01 00:00:00';

For irregular series, get closest value:

SELECT metric_value
FROM metrics
WHERE metric_name = 'price'
  AND timestamp <= '2024-02-01 12:00:00'
ORDER BY timestamp DESC
LIMIT 1;

Time Range Aggregations

Aggregate over time windows:

-- Daily aggregates from minute-level data
SELECT
  DATE_TRUNC('day', timestamp) as day,
  AVG(metric_value) as avg_value,
  MAX(metric_value) as max_value,
  MIN(metric_value) as min_value
FROM metrics
WHERE metric_name = 'cpu_usage'
  AND timestamp >= '2024-02-01'
  AND timestamp < '2024-02-08'
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;

Time Window Functions

Calculations relative to each point in time:

-- 7-day rolling average
SELECT
  date,
  revenue,
  AVG(revenue) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) as rolling_7_day_avg
FROM daily_metrics;

-- Period-over-period comparison
SELECT
  date,
  revenue,
  LAG(revenue, 7) OVER (ORDER BY date) as revenue_7_days_ago,
  revenue - LAG(revenue, 7) OVER (ORDER BY date) as week_over_week_change
FROM daily_metrics;

Gap Detection and Filling

Identify missing data points:

-- Find gaps in hourly data
WITH expected_times AS (
  SELECT generate_series(
    '2024-02-01 00:00:00'::timestamp,
    '2024-02-07 23:00:00'::timestamp,
    '1 hour'::interval
  ) as expected_time
),
actual_times AS (
  SELECT DISTINCT DATE_TRUNC('hour', timestamp) as actual_time
  FROM metrics
  WHERE metric_name = 'requests'
)
SELECT expected_time
FROM expected_times e
LEFT JOIN actual_times a ON e.expected_time = a.actual_time
WHERE a.actual_time IS NULL;

Fill gaps with interpolation or carry-forward:

-- Forward-fill missing values
-- (IGNORE NULLS is supported in engines such as Oracle and BigQuery;
--  PostgreSQL does not support it, and clause placement varies by dialect)
SELECT
  t.time_point,
  COALESCE(
    m.metric_value,
    LAST_VALUE(m.metric_value IGNORE NULLS) OVER (
      ORDER BY t.time_point
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
  ) as value_filled
FROM time_spine t
LEFT JOIN metrics m
  ON t.time_point = m.timestamp AND m.metric_name = 'price';

Time Spine Tables

A time spine (or date dimension) provides a complete set of time points for joins:

CREATE TABLE time_spine (
  time_point TIMESTAMP PRIMARY KEY,
  date DATE,
  hour INT,
  day_of_week VARCHAR(10),
  month INT,
  quarter INT,
  year INT,
  is_weekend BOOLEAN,
  is_holiday BOOLEAN
);

-- Populated with every minute/hour/day in your data range
INSERT INTO time_spine
SELECT
  ts,
  ts::date,
  EXTRACT(HOUR FROM ts),
  ...
FROM generate_series(
  '2020-01-01'::timestamp,
  '2030-12-31'::timestamp,
  '1 day'::interval
) ts;

Time spines enable:

  • Gap detection and filling
  • Consistent time-based aggregation
  • Joining irregular events to regular time grids (see the sketch after this list)
  • Calendar-aware calculations
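
As an example, here is a sketch that joins a hypothetical events table (with an event_time column) to a spine populated at hourly grain, so hours with no activity still appear with a count of zero:

-- Hourly event counts, including empty hours
SELECT
  t.time_point,
  COUNT(e.event_time) AS event_count
FROM time_spine t
LEFT JOIN events e
  ON DATE_TRUNC('hour', e.event_time) = t.time_point
WHERE t.time_point >= '2024-02-01' AND t.time_point < '2024-02-02'
GROUP BY t.time_point
ORDER BY t.time_point;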

Downsampling and Retention

High-frequency data accumulates rapidly. Manage with tiered retention:

retention_policy:
  raw_data: 7_days          # Full second-level detail
  minute_aggregates: 30_days # 1-minute rollups
  hourly_aggregates: 1_year  # Hourly summaries
  daily_aggregates: 5_years  # Daily summaries

Implement with materialized aggregates:

-- Hourly aggregate table
CREATE TABLE metrics_hourly AS
SELECT
  DATE_TRUNC('hour', timestamp) as hour,
  metric_name,
  AVG(metric_value) as avg_value,
  MAX(metric_value) as max_value,
  MIN(metric_value) as min_value,
  COUNT(*) as sample_count
FROM metrics
GROUP BY DATE_TRUNC('hour', timestamp), metric_name;
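
Once a rollup is in place, raw rows beyond the raw-data window can be purged (a sketch matching the 7-day policy above; with partitioned tables, dropping old partitions is usually cheaper):

-- Purge raw measurements older than the raw-data retention window
DELETE FROM metrics
WHERE timestamp < NOW() - INTERVAL '7 days';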

AI Analytics Considerations

Time Zone Handling

Time zones cause subtle errors. Be explicit about:

  • What timezone raw data is stored in (prefer UTC)
  • What timezone business users expect in reports
  • How to handle entities across time zones

Document in semantic layer:

column:
  name: event_timestamp
  storage_timezone: UTC
  display_timezone: user_local
  business_day_cutoff: "00:00 US/Eastern"
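
In queries, the conversion should be explicit as well. A sketch assuming a hypothetical events table whose event_timestamp column stores UTC values without time zone (PostgreSQL AT TIME ZONE syntax):

-- Group by US/Eastern business day rather than UTC day
SELECT
  DATE_TRUNC('day',
    event_timestamp AT TIME ZONE 'UTC' AT TIME ZONE 'US/Eastern') AS business_day,
  COUNT(*) AS events
FROM events
GROUP BY 1
ORDER BY 1;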

Seasonality Awareness

AI systems need to understand cyclical patterns:

  • Daily patterns (peak hours, overnight lulls)
  • Weekly patterns (weekday vs. weekend)
  • Monthly patterns (end-of-month spikes)
  • Annual patterns (holidays, seasons)

metric:
  name: daily_active_users
  seasonality:
    weekly: significant   # Weekend dips expected
    annual: moderate      # Summer slowdown, holiday spikes
  comparison_recommendation: same_day_previous_week

Timestamp Precision

Document expected precision and how AI should handle queries:

table:
  name: server_metrics
  timestamp_precision: second
  expected_interval: 60_seconds
  gap_handling: interpolate_linear

This prevents AI from misinterpreting gaps or generating unrealistic precision in results.

Period Boundary Clarity

"Last month" is ambiguous:

  • Calendar last month (February)?
  • Last 30 days?
  • Last 4 weeks?
  • Fiscal last month?

time_period_definitions:
  last_month:
    type: calendar_month
    timezone: US/Eastern
  last_30_days:
    type: rolling
    days: 30
    inclusive_today: false

AI systems use these definitions rather than guessing.
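
The two definitions above translate into different filters; a sketch of each (order_date is a placeholder column, and calendar boundaries should be evaluated in the declared time zone):

-- last_month: previous calendar month
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND order_date <  DATE_TRUNC('month', CURRENT_DATE)

-- last_30_days: rolling 30-day window, excluding today
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
  AND order_date <  CURRENT_DATE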

Performance Optimization

Indexing Strategies

  • Timestamp index: Required for all time-based queries
  • Composite indexes: (timestamp, metric_name) for filtered queries
  • Covering indexes: Include commonly selected columns
  • Partial indexes: Index only recent data that's queried frequently (sketches after this list)
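
A couple of hedged examples against the metrics table (column order and predicates depend on your actual query patterns):

-- Composite index: equality filter on metric_name plus a timestamp range
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, timestamp);

-- Partial index on recent data only; the predicate must be a constant,
-- so it needs to be rolled forward periodically
CREATE INDEX idx_metrics_recent ON metrics (timestamp)
  WHERE timestamp >= DATE '2024-01-01';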

Pre-aggregation

Compute common aggregations ahead of time:

-- Materialized daily summary
CREATE MATERIALIZED VIEW daily_summary AS
SELECT
  date,
  SUM(revenue) as total_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM transactions
GROUP BY date;

-- Refresh daily
REFRESH MATERIALIZED VIEW daily_summary;

Columnar Storage

Time series queries typically scan many rows but few columns. Columnar storage (Parquet, columnar database engines) provides significant performance benefits.

Compression

Time series data compresses well due to:

  • Sequential similarity (adjacent values often similar)
  • Repeating patterns (same metric names)
  • Predictable types (timestamps, numbers)

Use columnar formats and database compression features to reduce storage 10-100x.

Time series modeling requires balancing flexibility with query performance. Choose storage formats based on cardinality and query patterns. Implement downsampling for data lifecycle management. Document temporal semantics clearly in the semantic layer so AI systems understand time-based data correctly.

Questions

How is time series data different from transactional data?

Time series data is specifically ordered by time, with analysis focusing on temporal patterns - trends, seasonality, correlations over time. Transactional data may have timestamps but is analyzed primarily by business dimensions (customer, product, region). The same data can be both - sales transactions are transactional data but can also be analyzed as a time series.
